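# install.packages(c("ggplot2", "data.table", "plotly")) once, if needed
library(ggplot2)     # plotting
library(data.table)  # fread() for fast CSV import
library(plotly)      # ggplotly() for interactive versions of ggplot2 plots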

Attaching package: ‘plotly’

The following object is masked from ‘package:ggplot2’:

    last_plot

The following object is masked from ‘package:stats’:

    filter

The following object is masked from ‘package:graphics’:

    layout
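taxi <- fread("train.csv")

# Treat vendor and passenger count as categories (summary() below counts their levels)
taxi$vendor_id <- as.factor(taxi$vendor_id)
taxi$passenger_count <- as.factor(taxi$passenger_count)

head(taxi)
summary(taxi)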
      id            vendor_id  pickup_datetime    dropoff_datetime   passenger_count  
 Length:1458644     1:678342   Length:1458644     Length:1458644     1      :1033540  
 Class :character   2:780302   Class :character   Class :character   2      : 210318  
 Mode  :character              Mode  :character   Mode  :character   5      :  78088  
                                                                     3      :  59896  
                                                                     6      :  48333  
                                                                     4      :  28404  
                                                                     (Other):     65  
 pickup_longitude  pickup_latitude dropoff_longitude dropoff_latitude store_and_fwd_flag
 Min.   :-121.93   Min.   :34.36   Min.   :-121.93   Min.   :32.18    Length:1458644    
 1st Qu.: -73.99   1st Qu.:40.74   1st Qu.: -73.99   1st Qu.:40.74    Class :character  
 Median : -73.98   Median :40.75   Median : -73.98   Median :40.75    Mode  :character  
 Mean   : -73.97   Mean   :40.75   Mean   : -73.97   Mean   :40.75                      
 3rd Qu.: -73.97   3rd Qu.:40.77   3rd Qu.: -73.96   3rd Qu.:40.77                      
 Max.   : -61.34   Max.   :51.88   Max.   : -61.34   Max.   :43.92                      
                                                                                        
 trip_duration    
 Min.   :      1  
 1st Qu.:    397  
 Median :    662  
 Mean   :    959  
 3rd Qu.:   1075  
 Max.   :3526282  
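Converting passenger_count to a factor is convenient for counting and plotting. Turning it back into numbers later needs care, though, because as.numeric() on a factor returns the internal level codes and not the original values. Here is a minimal base-R illustration that uses a made-up vector (counts is hypothetical, not the taxi data):

```r
# Hypothetical passenger counts stored as a factor; its levels sort as "0", "1", "2", "6"
counts <- factor(c(1, 2, 6, 0, 1))

# as.numeric() on a factor returns the internal level codes, not the values
codes <- as.numeric(counts)

# Converting through the character labels recovers the original numbers
values <- as.numeric(as.character(counts))

print(codes)   # 2 3 4 1 2
print(values)  # 1 2 6 0 1
```

The codes match the values only when the levels are exactly 1, 2, 3, … with no gaps. as.numeric(levels(counts))[counts] is an equivalent idiom that also works.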
                  
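# Trips of 12,500 s (about 3.5 hours) or longer are treated as outliers; see the box plot below
taxi.refined <- subset(taxi, trip_duration < 12500)
sd(taxi.refined$trip_duration)
[1] 659.8239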

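# Map column names inside aes(); taxi$... inside aes() bypasses the data argument
ggplot(data = taxi, aes(x = passenger_count, fill = vendor_id)) + geom_bar(position = "dodge")
ggplot(data = taxi, aes(x = trip_duration)) + geom_histogram(binwidth = 1000)
ggplot(data = taxi, aes(x = vendor_id, y = trip_duration)) + geom_boxplot(outlier.colour = "red")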
ggplotly()

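The box plot shows a few extreme outliers for vendor 1. The longest trip in the data set lasts 3,526,282 s (about 41 days), compared with a median of 662 s (about 11 minutes), which is more than 5,000 times longer than a typical trip. These are most likely technical glitches rather than real journeys, so they should be removed before further analysis. taxi.refined keeps only trips shorter than 12,500 s.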
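The notebook picks its cut-off by eye and uses a fixed 12,500 s threshold. A data-driven alternative is Tukey's 1.5 × IQR rule, which is the same rule geom_boxplot() uses to decide which points to draw as outliers. This is a swapped-in technique for comparison and not what the notebook does. With the rounded quartiles from the summary above (397 s and 1,075 s), the upper fence would be 1,075 + 1.5 × 678 ≈ 2,092 s. That is far stricter than 12,500 s and would also drop many ordinary long trips, which is one reason a hand-picked threshold can be reasonable for such skewed durations. Here is a minimal sketch, where tukey_fences() and the durations vector are hypothetical:

```r
# Hypothetical helper: Tukey's fences, Q1 - k * IQR and Q3 + k * IQR (k = 1.5 by default)
tukey_fences <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)  # quartiles, R's default quantile type (7)
  iqr <- q[2] - q[1]
  c(lower = q[1] - k * iqr, upper = q[2] + k * iqr)
}

# Made-up trip durations in seconds, with one extreme value
durations <- c(300, 400, 500, 600, 700, 800, 3526282)

fences <- tukey_fences(durations)
kept <- durations[durations >= fences["lower"] & durations <= fences["upper"]]

print(fences)  # lower 0, upper 1200
print(kept)    # the extreme value is dropped
```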

# Density of trip durations with the mean (red), median (blue) and mean ± one standard error (green)
m  <- mean(taxi.refined$trip_duration)
se <- sd(taxi.refined$trip_duration) / sqrt(nrow(taxi.refined))   # standard error of the mean
ggplot(data = taxi.refined, aes(x = trip_duration)) +
  geom_density(stat = "count") +
  geom_vline(xintercept = m, col = "red", lwd = 1) +
  geom_vline(xintercept = median(taxi.refined$trip_duration), col = "blue", lwd = 1) +
  geom_vline(xintercept = m - se, col = "green") +
  geom_vline(xintercept = m + se, col = "green") +
  theme_light() +
  ggtitle("Distribution of trip duration") +
  xlab("Trip Duration in seconds")

The data set is so large that it effectively is the population itself. With roughly 1.46 million trips, the standard error (659.8 s / √n, about 0.55 s) is so small that both green ± SE lines overlap the red mean line.

# passenger_count is a factor at this point; convert through its labels, because as.numeric() alone returns level codes
taxi.refined$passenger_count <- as.numeric(as.character(taxi.refined$passenger_count))
plot(taxi.refined$passenger_count, taxi.refined$trip_duration)
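The green lines in the density plot sit at mean ± SE, where SE = s / √n. The standard error shrinks with the square root of the sample size, so for a data set this large it is tiny compared with the spread of the data. Here is a small base-R sketch of the computation. std_error() is a hypothetical helper, and the last line plugs in the standard deviation printed earlier, using the row count of train.csv as an upper bound for n:

```r
# Hypothetical helper: standard error of the mean, s / sqrt(n), ignoring missing values
std_error <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

# Same values repeated 10,000 times: nearly the same sd, but a much smaller SE
small <- c(1, 2, 3, 4)
large <- rep(small, 10000)

print(std_error(small))
print(std_error(large))

# sd about 659.8 s; n is at most 1,458,644 rows, so this is an approximation
approx_se <- 659.8239 / sqrt(1458644)
print(approx_se)  # well under one second
```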